Tempo Induction Using Filterbank Analysis and Tonal Features
Authors
Abstract
This paper presents an algorithm that extracts the tempo of a musical excerpt. The proposed system assumes a constant tempo and operates directly on the audio signal. A sliding window is applied to the signal and two classes of features are extracted. The first class is the log-energy of each band of a mel-scale triangular filterbank, a common feature vector used in various MIR applications. For the second class, a feature novel to the tempo induction task is introduced: the strengths of the twelve western musical tones across all octaves are calculated for each audio frame, in a similar fashion to the Pitch Class Profile. The time-evolving feature vectors are convolved with a bank of resonators, each resonator corresponding to a target tempo, and the results of the two feature classes are then combined to give the final output. The algorithm was evaluated on the popular ISMIR 2004 Tempo Induction Evaluation Exchange dataset. The results demonstrate that the superposition of the different feature types enhances the performance of the algorithm, which is competitive with the current state-of-the-art algorithms for the tempo induction task.
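The abstract outlines the full pipeline: a sliding window, mel-scale filterbank log-energies, a Pitch-Class-Profile-like tonal feature, a bank of tempo resonators, and a fusion step. The following is a minimal Python sketch of that pipeline, not the authors' implementation: the frame and hop sizes, the half-wave-rectified novelty, the comb-filter resonators, the candidate tempo range, and the way the two saliences are fused are all illustrative assumptions, and the helper names (frame_signal, tempo_salience, etc.) are hypothetical.

import numpy as np

def frame_signal(x, frame_len=2048, hop=512):
    # Slice the signal into overlapping Hann-windowed frames (the sliding-window stage).
    n = 1 + max(0, (len(x) - frame_len) // hop)
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n)[:, None]
    return x[idx] * np.hanning(frame_len)

def mel_filterbank(sr, n_fft, n_bands=40):
    # Triangular mel-scale filters applied to the magnitude spectrum.
    hz2mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    mel2hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz2mel(0), hz2mel(sr / 2), n_bands + 2)
    bins = np.floor((n_fft + 1) * mel2hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_bands, n_fft // 2 + 1))
    for b in range(1, n_bands + 1):
        l, c, r = bins[b - 1], bins[b], bins[b + 1]
        if c > l:
            fb[b - 1, l:c] = np.linspace(0, 1, c - l, endpoint=False)
        if r > c:
            fb[b - 1, c:r] = np.linspace(1, 0, r - c, endpoint=False)
    return fb

def pitch_class_profile(mag, sr, n_fft):
    # Fold spectral energy into the 12 western pitch classes across all octaves.
    freqs = np.fft.rfftfreq(n_fft, 1.0 / sr)
    valid = freqs > 27.5                      # ignore bins below A0
    pc = np.round(12 * np.log2(freqs[valid] / 440.0)).astype(int) % 12
    pcp = np.zeros((mag.shape[0], 12))
    for k in range(12):
        pcp[:, k] = mag[:, valid][:, pc == k].sum(axis=1)
    return pcp

def tempo_salience(features, frame_rate, tempi):
    # Band-wise half-wave-rectified novelty, then comb-filter resonators, one per target tempo.
    novelty = np.maximum(np.diff(features, axis=0), 0).sum(axis=1)
    novelty -= novelty.mean()
    salience = np.zeros(len(tempi))
    for i, bpm in enumerate(tempi):
        period = int(round(60.0 * frame_rate / bpm))
        comb = np.zeros(4 * period)
        comb[::period] = 1.0                  # impulses one beat period apart
        salience[i] = np.max(np.convolve(novelty, comb, mode="valid"))
    return salience

def estimate_tempo(x, sr):
    frames = frame_signal(x)
    mag = np.abs(np.fft.rfft(frames, axis=1))
    frame_rate = sr / 512.0                   # matches the hop size above
    tempi = np.arange(40, 241)                # candidate tempi in BPM
    mel = np.log1p(mag @ mel_filterbank(sr, frames.shape[1]).T)
    pcp = pitch_class_profile(mag, sr, frames.shape[1])
    # Combine the two feature classes by summing their normalised tempo saliences.
    s1 = tempo_salience(mel, frame_rate, tempi)
    s2 = tempo_salience(pcp, frame_rate, tempi)
    combined = s1 / (s1.max() + 1e-9) + s2 / (s2.max() + 1e-9)
    return tempi[np.argmax(combined)]

The sketch expects a mono excerpt of at least a few bars, so that the slowest comb filter fits inside the novelty curve; the maximum of the summed, normalised saliences is taken as the single constant-tempo estimate.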
Similar Papers
Context-dependent beat tracking of musical audio
We present a simple and efficient method for beat tracking of musical audio. With the aim of replicating the human ability of tapping in time to music, we formulate our approach using a two state model. The first state performs tempo induction and tracks tempo changes, while the second maintains contextual continuity within a single tempo hypothesis. Beat times are recovered by passing the outp...
Texture features for the reproduction of the perceptual organization of sound
Human categorization of sound seems predominantly based on sound source properties. To estimate these source properties we propose a novel sound analysis method, which separates sound into different sonic textures: tones, pulses, and broadband noises. The audible presence of tones or pulses corresponds to more extended cochleagram patterns than can be expected on the basis of correlations intro...
Mirex 2011: Audio Tag Classification Using Weighted-vote Nearest Neighbor Classification
In this long abstract, we present an algorithm for automatically annotating music with tags that is fast, scalable and relatively easy to implement. It uses acoustic similarity for propagating tags among audio items. The algorithm makes use of a variety of acoustical features, ranging from spectral features, to rhythm, tonal and high-level features (such as mood, genre, gender). These features a...
Unsupervised Deep Auditory Model Using Stack of Convolutional RBMs for Speech Recognition
Recently, we have proposed an unsupervised filterbank learning model based on Convolutional RBM (ConvRBM). This model is able to learn auditory-like subband filters using speech signals as an input. In this paper, we propose two-layer Unsupervised Deep Auditory Model (UDAM) by stacking two ConvRBMs. The first layer ConvRBM learns filterbank from speech signals and hence, it represents early aud...
Audio Visual Speech Enhancement
This thesis presents a novel approach to speech enhancement by exploiting the bimodality of speech production and the correlation that exists between audio and visual speech information. An analysis into the correlation of a range of audio and visual features reveals significant correlation to exist between visual speech features and audio filterbank features. The amount of correlation was also...